Overview
This section covers the basic hyperparameter tuning of the six models. Where
a model has several hyperparameters, as with Random Forest and Gradient
Boosted Trees, further tuning is required to ensure near-optimal values have
been found. Exhaustively searching the hyperparameter space of the ensemble
techniques would not be computationally sensible. Instead, the tuning could
be done in two phases: a first phase that is broad and general, followed by
a second phase that searches a finer range of values using the insights from
the first.
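The two-phase (coarse-then-fine) search described above can be sketched with scikit-learn's GridSearchCV. The model, data and grid values below are illustrative assumptions, not the settings used in this report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Phase 1: broad search over several orders of magnitude.
coarse = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]}, cv=3)
coarse.fit(X, y)
best_c = coarse.best_params_["C"]

# Phase 2: finer search in a narrow band around the phase-1 optimum.
fine = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": list(np.linspace(best_c / 2, best_c * 2, 5))}, cv=3)
fine.fit(X, y)
print(fine.best_params_)
```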

DUM - Dummy
GNB - Gaussian Naive Bayes
GB - Gradient Boosting
KNN - K-Nearest Neighbours
LSCM - Linear SVC
LG - Logistic Regression
RF - Random Forest
SVM - Support Vector Machine
Logistic Regression
Logistic Regression (LG) was trained across three different
hyperparameters, each relating to regularisation.
- C, the strength of regularisation; larger values indicate a smaller
regularisation effect.
- Penalty, the four types of regularisation tested: None (no
regularisation), L1, L2 and ElasticNet (L1 + L2).
- L1 Ratio, the ratio between L1 and L2 (only applicable for
ElasticNet).
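A sketch of this grid for scikit-learn's LogisticRegression, with illustrative values rather than the report's actual ones. Because l1_ratio only applies to ElasticNet, the grid is split into two dictionaries; the saga solver is used since it supports all three penalties, and the no-regularisation case can be added as penalty=None on recent scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Split grid: l1_ratio is only supplied in the ElasticNet case.
param_grid = [
    {"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1.0, 10.0]},
    {"penalty": ["elasticnet"], "C": [0.01, 0.1, 1.0, 10.0],
     "l1_ratio": [0.25, 0.5, 0.75]},
]
grid = GridSearchCV(LogisticRegression(solver="saga", max_iter=5000),
                    param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```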
[Table: values searched for C, Penalty and L1 Ratio; best hyperparameters for each metric]
K-Nearest Neighbours
K-Nearest Neighbours (KNN) was trained across only two
hyperparameters:
- K (number of neighbours), the number of neighbouring points involved
in the calculation.
- Weight, how the proximity of these neighbouring points is weighted
(Uniform or proportional to Distance).
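With only two dimensions, the KNN search is small enough to enumerate exhaustively. A minimal sketch with scikit-learn's KNeighborsClassifier, using illustrative values:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)

# K and the weighting scheme are the only two dimensions searched.
param_grid = {"n_neighbors": [1, 3, 5, 7, 11],
              "weights": ["uniform", "distance"]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```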

[Table: values searched for K and Weight; best hyperparameters for each metric]
Support Vector Machine
A variety of Support Vector Machines (SVM) were trained on three
hyperparameters:
- C, as in logistic regression, the regularisation factor (smaller
values mean a larger penalty).
- Kernel, the core of the SVM: Linear, Polynomial or a Radial Basis
Function (RBF).
- Gamma, determines the influence of a single data point; larger values
mean the points need to be closer to influence each other.
Note: Gamma only applies to a polynomial or RBF kernel, and
scikit-learn offers two methods of determining an appropriate value,
“Auto” and “Scale” - the maths behind the scenes should be described in
the future.
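A sketch of this search with scikit-learn's SVC, using illustrative values. Splitting the grid keeps gamma out of the linear-kernel case; for reference, scikit-learn's "scale" sets gamma to 1 / (n_features * X.var()) and "auto" to 1 / n_features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Gamma only applies to the polynomial and RBF kernels, so the grid is
# split so the linear kernel is not needlessly crossed with gamma values.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1.0, 10.0]},
    {"kernel": ["poly", "rbf"], "C": [0.1, 1.0, 10.0],
     "gamma": ["scale", "auto"]},
]
grid = GridSearchCV(SVC(), param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```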
[Table: values searched for C, Kernel and Gamma; best hyperparameters for each metric]
Linear Support Vector Machine*
*Scikit-learn has two APIs for an SVM; the latter only supports a
linear kernel but offers more methods of regularisation. It is also
reported to be significantly quicker - by an order of magnitude - for
larger datasets.
A Linear SVM (LSCM) does not take the kernel or gamma hyperparameters,
since it is constrained to a linear kernel, which has no gamma
parameter. However, due to various technical issues, only the L2 penalty
was used, rather than also including L1 and ElasticNet. The
hyperparameters used were:
- C, the scale of regularisation; larger values indicate smaller
penalties.
- Loss, how an error is determined: Hinge or Squared Hinge.
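A sketch of this search with scikit-learn's LinearSVC, using illustrative values. Hinge loss with the L2 penalty requires the dual formulation, so dual=True is set explicitly.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

# Only the L2 penalty is used; hinge + L2 needs dual=True.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0],
              "loss": ["hinge", "squared_hinge"]}
grid = GridSearchCV(LinearSVC(penalty="l2", dual=True, max_iter=10000),
                    param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```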
[Table: values searched for C and Loss; best hyperparameters for each metric]
Random Forest
Random Forest (RF) is an ensemble technique that trains several decision
trees and aggregates across them to form a stronger predictor. RF has
several hyperparameters to test; not all of them were selected, as they
are not all equally important. The selected few were:
- Number of Trees, simply the number of trees trained.
- Max Depth, a common parameter to regularise a decision tree,
controlling how many layers it travels down before terminating.
- Minimum Samples in a Split, the minimum number of samples allowed
when splitting a node.
- Minimum Samples for a Leaf Node, the minimum number of samples
allowed before a leaf node is created.
- Max Number of Features, the number of features considered when
searching for the best split.
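With five dimensions the full grid grows multiplicatively, which is why an exhaustive search is not computationally sensible here. One option, sketched below with illustrative values, is scikit-learn's RandomizedSearchCV, which samples a fixed number of combinations instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Sample 10 of the 3*3*3*3*3 = 243 possible combinations.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
    "max_features": ["sqrt", "log2", 0.5],
}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_distributions, n_iter=10, cv=3,
                            random_state=0)
search.fit(X, y)
print(search.best_params_)
```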
[Table: values searched for Number of Trees, Max Depth, Min Samples Split, Min Samples Leaf and Max Number of Features; best hyperparameters for each metric]
Gradient Boosted Tree
Similar to Random Forest, Gradient Boosted Trees (GB) have several
hyperparameters to tune, and the same parameters as for RF were used
here. GB also has an additional hyperparameter, the learning rate, which
determines how much the previous generation of trees influences the
current tree.
- Number of Trees (n_estimators), simply the number of trees trained.
- Max Depth, a common parameter to regularise a decision tree,
controlling how many layers it travels down before terminating.
- Minimum Samples allowed in a Split, the minimum number of samples
allowed when splitting a node.
- Minimum Samples for a Leaf Node, the minimum number of samples
allowed before a leaf node is created.
- Max Number of Features, the number of features considered when
searching for the best split.
- Learning Rate, how strongly each new tree corrects its predecessors;
larger values mean quicker learning at the cost of decreased
flexibility.
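The GB search can be sketched the same way as the Random Forest one, with the learning rate as an extra dimension; as before, the values below are illustrative assumptions and a randomised search keeps the combination count manageable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Same dimensions as the Random Forest search, plus the learning rate.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 5],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
    "max_features": ["sqrt", "log2", None],
    "learning_rate": [0.01, 0.1, 0.3],
}
search = RandomizedSearchCV(GradientBoostingClassifier(random_state=0),
                            param_distributions, n_iter=10, cv=3,
                            random_state=0)
search.fit(X, y)
print(search.best_params_)
```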
[Table: values searched for Number of Trees, Max Depth, Min Samples Split, Min Samples Leaf, Max Number of Features and Learning Rate; best hyperparameters for each metric]